Fix --remote_download_regex matching all files unconditionally #3301

Merged
brentleyjones merged 1 commit into main from fix/remote-download-regex-match-all
Apr 10, 2026

@ra1028 ra1028 commented Apr 10, 2026

Summary

Fix a bug in generate_index_build_bazel_dependencies.sh where the --remote_download_regex pattern matches all files unconditionally, defeating the purpose of selective downloads during Index Build.

Problem

The current regex:

--remote_download_regex=${indexstores_regex}.*|.*\.(cfg|c|C|cc|...)$

The .*| at the boundary leaves the alternation with a branch that is a bare, unanchored .* wildcard. An alternation matches whenever any one of its branches matches, and .* matches every path, so the extension filter after the | never constrains anything. As a result, all remote cache outputs are downloaded locally during Index Build, regardless of file extension.

The intent was to download only indexstores (matched by ${indexstores_regex}) and files with indexing-relevant extensions (.swiftmodule, .swiftdoc, .swift, headers, etc.). Instead, every output, including .o files, intermediate artifacts, and linked binaries, is downloaded.
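
To make the effect concrete, here is a minimal sketch that checks a few made-up output paths against the old pattern with the variable expanded to empty. It uses grep -E as a stand-in for Bazel's regex matching, and the extension list is a shortened, hypothetical subset of the real one, so treat it as an illustration rather than the script's actual code.

```shell
#!/bin/sh
# Illustration only: grep -E stands in for Bazel's regex matching, and the
# extension list is a shortened, hypothetical subset of the real one.
indexstores_regex=""   # assumed empty, i.e. indexstores not requested
old_regex="${indexstores_regex}.*|.*\.(swiftmodule|swiftdoc|swift|h)$"

# Made-up output paths; only the first one is relevant to indexing.
old_matches=0
for path in \
  "bazel-out/bin/App/App.swiftmodule" \
  "bazel-out/bin/App/main.o" \
  "bazel-out/bin/App/App_binary"
do
  if printf '%s\n' "$path" | grep -Eq -- "$old_regex"; then
    old_matches=$((old_matches + 1))
    echo "old pattern matches: $path"
  fi
done
echo "matched $old_matches of 3 paths"
```

All three paths match, including the object file and the binary, because the first branch of the alternation accepts any string.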

Fix

Remove the erroneous .*| so the regex correctly constrains downloads to indexstores and listed extensions only:

--remote_download_regex=${indexstores_regex}.*\.(cfg|c|C|cc|...)$
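
One way to sanity-check the corrected pattern is to run it over sample paths for both states of the variable. The sketch below assumes indexstores_regex is either empty or an alternative ending in | (the value shown is hypothetical, not copied from the script), uses a shortened extension list and made-up paths, and uses grep -E in place of Bazel's matcher.

```shell
#!/bin/sh
# Hypothetical values: the real indexstores_regex and extension list live in
# generate_index_build_bazel_dependencies.sh and may differ.
extensions='swiftmodule|swiftdoc|swift|h'

# should_download PREFIX PATH
# Succeeds when PATH matches the fixed pattern built from PREFIX.
should_download() {
  printf '%s\n' "$2" | grep -Eq -- "${1}.*\.(${extensions})$"
}

enabled='.*\.indexstore/.*|'   # assumed shape: an alternative ending in "|"
disabled=''                    # indexstores not requested

# check LABEL PREFIX PATH: print the verdict for one combination.
check() {
  if should_download "$2" "$3"; then verdict=download; else verdict=skip; fi
  echo "$1: $3 -> $verdict"
}

for path in "out/App.swiftmodule" "out/main.o" "out/App.indexstore/v5/units/u1"; do
  check enabled "$enabled" "$path"
  check disabled "$disabled" "$path"
done
```

Under these assumptions the .swiftmodule path is downloaded in both modes, the .o path is skipped in both, and the indexstore path is downloaded only when the prefix is set.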

Impact

In a large-scale monorepo, this bug caused Index Build's local storage to grow from an expected few GB to over 1 TB, because all remote cache outputs were downloaded unconditionally. With the fix, only indexing-relevant files are downloaded.

Related

@ra1028 ra1028 requested a review from a team as a code owner April 10, 2026 09:38

@adincebic adincebic left a comment

LGTM.

The regex pattern `.*|.*\.(ext1|ext2|...)$` matches everything
because `.*` before `|` is an unbounded wildcard with no anchor.
This makes the extension filter useless, causing all remote cache
outputs to be downloaded.

Remove the `.*|` so only files with the listed extensions (plus
indexstores when enabled) are downloaded.

Signed-off-by: Ryo Aoyama <r.fe51028.r@gmail.com>
@ra1028 ra1028 force-pushed the fix/remote-download-regex-match-all branch from 0a2665d to 357bf2d Compare April 10, 2026 09:49

ra1028 commented Apr 10, 2026

Note: This fix addresses the remote cache download path. For local builds, the storage bloat is primarily caused by linked product binaries produced by the bp output group. See #3302 for a complementary fix that eliminates linking during Index Build.

@brentleyjones brentleyjones left a comment

Thanks for this fix! This clearly regressed in 4a02abb where I only moved part of the regex into the variable.

@brentleyjones brentleyjones merged commit d06beea into main Apr 10, 2026
9 of 11 checks passed
@brentleyjones brentleyjones deleted the fix/remote-download-regex-match-all branch April 10, 2026 13:14
3 participants